{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Problem set**: http://cs229.stanford.edu/materials/ps0.pdf\n",
    "\n",
    "Suppose all vectors are column vectors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\nabla_x (b^Tx) = b$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculating $\\nabla_x (x^TAx)$, the hard way"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's look at $x^TAx$ by expanding it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "x^TAx &= x^T(Ax) \\\\\n",
    "&= x^T\n",
    "\\begin{bmatrix}\n",
    " R_1 x \\\\ \n",
    " R_2 x \\\\\n",
    " \\cdots \\\\\n",
    " R_n x\n",
    "\\end{bmatrix} \\\\\n",
    "&= x^T\n",
    "\\begin{bmatrix}\n",
    " \\sum_{j=1}^{n} A_{1j} x_j \\\\ \n",
    " \\sum_{j=1}^{n} A_{2j} x_j \\\\\n",
    " \\cdots \\\\\n",
    " \\sum_{j=1}^{n} A_{nj} x_j\n",
    "\\end{bmatrix} \\\\\n",
    "&= x_1\\sum_{j=1}^{n} A_{1j} x_j + x_2\\sum_{j=1}^{n} A_{2j} x_j + \\cdots + x_n \\sum_{j=1}^{n} A_{nj} x_j \\\\\n",
    "&= \\sum_{i=1}^n \\sum_{j=1}^{n} x_i A_{ij} x_j \\\\\n",
    "&= \\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, I used $R_k$ to mean a particular row of matrix $A$, equivalent to $a_k^T$ used in the lecture note."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now it comes $\\nabla_x (\\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j)$, let's look at one $x$ value at a time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial \\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j}{\\partial x_k} &= \\sum_{j \\neq k}^{n} A_{kj} x_j + \\sum_{i \\neq k}^{n} A_{ik} x_{i} + 2 A_{kk} x_k \\\\\n",
    "&= \\sum_{j \\neq k}^{n} A_{kj} x_j + \\sum_{j \\neq k}^{n} A_{jk} x_{j} + 2 A_{kk} x_k \\\\\n",
    "&= 2 A_{kk} x_k + 2 \\sum_{j \\neq k}^{n} A_{kj} x_j \\\\\n",
    "&= 2 \\sum_{j = 1}^{n} A_{kj} x_j \\\\\n",
    "&= 2 R_k x \\\\\n",
    "\\end{align*}\n",
    "\n",
    "1. All other components independent of $x_k$ are gone, ending up with three parts. The first part is obtained when $x_i$ is fixed to be $x_k$, and the second part is obtained when $x_j$ is fixed to be $x_k$. When both $x_i$ and $x_j$ are $x_k$, we get the third part.\n",
    "\n",
    "2. Then, the equation is rewritten,\n",
    "3. and merged, given that A is symmetic, i.e. $A_kj$ = $A_jk$.\n",
    "4. Finally, it's written in matrix form"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now considering all values in $x$, by stacking $2R_k x$ for all $n$ $k$ values\n",
    "\n",
    "\\begin{align*}\n",
    "\\nabla_x (x^T(Ax)) = \\nabla_x \\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j = 2 \\begin{bmatrix}\n",
    " R_1 x \\\\ \n",
    " R_2 x \\\\\n",
    " \\cdots \\\\\n",
    " R_k x \\\\\n",
    " \\cdots \\\\\n",
    " R_n x\n",
    "\\end{bmatrix} = 2Ax\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To answer the question in the problem set (with a $\\frac{1}{2}$ in front):\n",
    "\n",
    "$$\\nabla f(x) = \\nabla_x (\\frac{1}{2} x^T A x + b^T x) = Ax + b$$,\n",
    "\n",
    "which is very analogous to the univariate case: \n",
    "$$\\frac{d(\\frac{1}{2}ax^2 + bx)}{dx} = ax + b$$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\nabla_x f(x) = \\nabla_x g(h(x)) = \\begin{bmatrix}\n",
    " \\frac{d}{d h} g(h) \\frac{\\partial}{\\partial x_1} h(x) \\\\ \n",
    " \\cdots \\\\\n",
    " \\frac{d}{d h} g(h) \\frac{\\partial}{\\partial x_n} h(x) \\\\ \n",
    "\\end{bmatrix}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Continued from **1(a)**,\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j}{\\partial x_k \\partial x_l} = \n",
    "\\frac{\\partial 2R_k x}{\\partial x_l} = 2 \\frac{\\partial \\sum_{j = 1}^{n} A_{kj} x_j}{\\partial x_l} = 2 A_{kl}\n",
    "$$\n",
    "\n",
    "so (again, with $\\frac{1}{2}$ in front and canceled)\n",
    "\n",
    "$$\n",
    "\\nabla^2 f(x) = \\nabla^2 (\\frac{1}{2}x^T A x + b^Tx) = \n",
    "\\begin{bmatrix}\n",
    " A_{11} & A_{12} & \\cdots & A_{1n} \\\\ \n",
    " A_{21} & A_{22} & \\cdots & A_{2n} \\\\\n",
    " \\cdots & \\cdots & \\ddots & \\cdots \\\\\n",
    " A_{n1} & A_{n2} & \\cdots & A_{nn}\n",
    "\\end{bmatrix}  = A\n",
    "$$\n",
    ", which is also analogous to \n",
    "$$\\frac{d^2(\\frac{1}{2}ax^2 + bx)}{dx^2} = a$$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\nabla f(x) = g^{'}(a^Tx) \\cdot a$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\nabla ^2 f(x) = g^{''}(a^Tx)\\cdot(aa^T)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2(a)\n",
    "\n",
    "Part 1.\n",
    "\n",
    "Given $A = zz^T$, so $A^T = z^T z = A$. Besides, $A_{ij} = z_i z_j$.\n",
    "\n",
    "\n",
    "Part 2. Following 1(a)\n",
    "\n",
    "\\begin{align*}\n",
    "x^T A x &= \\sum_{i=1}^n \\sum_{j=1}^{n} A_{ij} x_i x_j \\\\\n",
    "        &= \\sum_{i=1}^n \\sum_{j=1}^{n} z_i z_j x_i x_j \\\\\n",
    "        &= \\sum_{i=1}^n \\sum_{j=1}^{n} (z_i x_i) (z_j x_j) \\\\\n",
    "        &= (\\sum_{i=1}^{n} z_i x_i)^2 \\\\\n",
    "        &\\ge = 0\n",
    "\\end{align*}\n",
    "\n",
    "Because $A = A^T$ and $x^T A x \\ge 0$, so A is positive semidefinite (PSD)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2(b)\n",
    "\n",
    "\\begin{align*}\n",
    "A &= zz^T \\\\\n",
    "  &= \n",
    "\\begin{bmatrix}\n",
    " z_1 z_1 & z_1 z_2 & \\cdots & z_1 z_n \\\\ \n",
    " z_2 z_1 & z_2 z_2 & \\cdots & z_2 z_n \\\\\n",
    " \\cdots & \\cdots & \\ddots & \\cdots \\\\\n",
    " z_n z_1 & z_n z_2 & \\cdots & z_n z_n\n",
    "\\end{bmatrix} \\\\\n",
    "& \\rightarrow \n",
    "\\begin{bmatrix}\n",
    " z_1 z_1 & z_1 z_2 & \\cdots & z_1 z_n \\\\ \n",
    " 0 & 0 & \\cdots & 0 \\\\\n",
    " \\cdots & \\cdots & \\ddots & \\cdots \\\\\n",
    " 0 & 0 & \\cdots & 0\n",
    "\\end{bmatrix} \\\\\n",
    " &= z_1\n",
    "\\begin{bmatrix}\n",
    " z_1 & z_2 & \\cdots & z_n \\\\ \n",
    " 0 & 0 & \\cdots & 0 \\\\\n",
    " \\cdots & \\cdots & \\ddots & \\cdots \\\\\n",
    " 0 & 0 & \\cdots & 0\n",
    "\\end{bmatrix} \\\\\n",
    " &= R \\\\\n",
    "\\end{align*}\n",
    "\n",
    "The $\\rightarrow$ means row transformation by multiplying row $1$ with $-\\frac{z_i}{z_1}$, and add it to row $i$. This is possible because $z$ is given as a _non-zero n-vector_.\n",
    "\n",
    "So the rank of $A$ is $1$.\n",
    "\n",
    "To obtain its nullspace, we need to find all possible $x$ vectors that satisfy $Ax = 0$, Before that, let's consider the dimensions of four fundamental subspaces in linear algebra:\n",
    "\n",
    "\n",
    "|                | Symbol   | Dimension | Note                                                                                |\n",
    "|----------------|----------|-----------|-------------------------------------------------------------------------------------|\n",
    "| Row space      | $C(A^T)$ | 1         | Equal to rank                                                                                    |\n",
    "| Column space   | $C(A)$   | 1         | Equal to rank                                                                       |\n",
    "| Nullspace      | $N(A)$   | n - 1     | Usually the dimension of nullspace is $m - 1$, but given A is symmetric, so $m = n$ |\n",
    "| Left nullspace | $N(A^T)$ | n - 1     | - |\n",
    "\n",
    "Therefore, the nullspace of $A$ consists of $n-1$ vectors, which could be derived from $n-1$ free variables. Result:\n",
    "\n",
    "\\begin{align*}\n",
    "Ax &= 0 \\\\\n",
    "Rx &= 0 \\\\\n",
    "z_1 \\sum_{i=1}^{n} z_i x_i & = 0 \\\\\n",
    "\\sum_{i=1}^{n} z_i x_i &= 0\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore, by setting any of the $n=1$ free variables to $1$, and the rest to $0$, and then calculating a value for the pivot variable ($x_1$), we can obtain $n-1$ independent vectors for $x$, and by multiplying each vector with an arbitrary constant, we obtain the nullspace, i.e. all possible vectors that satisfy $Ax = 0$.\n",
    "\n",
    "\\begin{align*}\n",
    "N(A) &=\n",
    "c_2 \\begin{bmatrix}\n",
    " -z_2/z_1 \\\\ \n",
    " 1 \\\\\n",
    " 0 \\\\\n",
    " 0 \\\\\n",
    " \\vdots \\\\\n",
    " 0\n",
    "\\end{bmatrix} + \n",
    "c_3 \\begin{bmatrix}\n",
    " -z_3/z_1 \\\\\n",
    " 0 \\\\\n",
    " 1 \\\\\n",
    " 0 \\\\\n",
    " \\vdots \\\\\n",
    " 0\n",
    "\\end{bmatrix} +\n",
    "\\cdots +\n",
    "c_i \\begin{bmatrix}\n",
    " -z_i/z_1 \\\\\n",
    " 0 \\\\\n",
    " \\vdots \\\\\n",
    " 1 \\\\\n",
    " \\vdots \\\\\n",
    " 0\n",
    "\\end{bmatrix} + \n",
    "\\cdots +\n",
    "c_n \\begin{bmatrix}\n",
    " -z_n/z_1 \\\\\n",
    " 0 \\\\\n",
    " 0 \\\\\n",
    " 0 \\\\\n",
    " \\vdots \\\\\n",
    " 1\n",
    "\\end{bmatrix} \\\\\n",
    "&= \\sum_{k=2}^n c_k \\begin{bmatrix}\n",
    " -z_k/z_1 \\\\\n",
    " I\\{i=k\\} \\\\\n",
    " I\\{i=k\\} \\\\\n",
    " \\vdots \\\\\n",
    " I\\{i=k\\}\n",
    "\\end{bmatrix}\n",
    "\\end{align*}\n",
    "\n",
    "where $i$ is the row index, $I$ is an indicator function, and $c_k$ is a constant. Note, $k$ starts from $2$ instead of $1$. This is meaningful only when $z_1$ is non-zero."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given \n",
    "\n",
    "1. $A \\in \\mathbb{R}^{n\\times n}$ is PSD, so $A = A^T$, and $x^T A x \\ge 0$, and \n",
    "1. $B \\in \\mathbb{R}^{m\\times n}$\n",
    "\n",
    "Let's denote $C = BAB^T$. We need to prove two parts in order to show that $C$ is PSD, too.\n",
    "\n",
    "Part 1.\n",
    "\n",
    "\\begin{align*}\n",
    "C^T &= (BAB^T)^T \\\\\n",
    "    &= (B^T)^T A^T B^T \\\\\n",
    "    &= B A^T B^T \\\\\n",
    "    &= B A B^T \\\\\n",
    "    &= C\n",
    "\\end{align*}\n",
    "\n",
    "Part 2.\n",
    "\\begin{align*}\n",
    "x^T C x &= x^T (BAB^T) x \\\\\n",
    "        &= (x^T B) A (B^T x) \\\\\n",
    "        &= (B^T x)^T A (B^T x) \\\\\n",
    "        &= y^T A y \\\\\n",
    "        &\\ge 0\n",
    "\\end{align*}\n",
    "\n",
    "Here we transformed $x^T C x$ into one that leverages the property of A, $x^T A x \\ge 0$, by setting $y = B^T x$. Therefore, $BAB^T$ is PSD.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3(a)\n",
    "\n",
    "Given $A = T \\Lambda T^{-1}$, so\n",
    "\n",
    "\\begin{align*}\n",
    "            & A T = T \\Lambda \\\\\n",
    "\\rightarrow & A[t^{(1)} \\cdots t^{(n)}] = [t^{(1)} \\cdots t^{(n)}] \\Lambda \\\\\n",
    "\\rightarrow & At^{(i)} = t^{(i)} \\lambda_i \\\\\n",
    "\\end{align*}\n",
    "\n",
    "The last step is because $\\Lambda$ is a diagonal matrix with all off-diagonal values being zeros. Therefore, \n",
    "\n",
    "$$ At^{(i)} = \\lambda_i t^{(i)} $$\n",
    "\n",
    "$(t^{(i)}, \\lambda_i)$ is an eigenvalue/eigenvector pair."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3(b)\n",
    "\n",
    "##### **Spectral theorem (important!)**: \n",
    "\n",
    "If $A \\in \\mathbb{R}^{n \\times n}$ is **symetric** (i.e. $A = A^T$), then $A$ is **diagonalizable** ($\\Lambda$) by a **real** **orthogonal** ($U$) matrix. In symbols, \n",
    "\n",
    "$$ U^T A U = \\Lambda$$\n",
    "\n",
    "or equivalently,\n",
    "\n",
    "$$ A = U \\Lambda U^T$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given $A = A^T$, and $U$ is orthogonal, and $A = U \\Lambda U^T$, then\n",
    "\n",
    "$$AU = U \\Lambda U^T U = U \\Lambda$$\n",
    "\n",
    "Following the same logic as in **3(a)**, we get\n",
    "\n",
    "$$Au^{(i)} = \\lambda_i u^{(i)}$$\n",
    "\n",
    "so $u^{(i)}$ is an eigenvector."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given that $A$ is PSD, so for any $n$-dim vector $x$, $x^T A x \\ge 0$ (PSD property). Since the eigenvector $u^{(i)}$ is an $n$-dim vector, the PSD property applies for it, too."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\forall i,u^{{(i)}^T}Au^{(i)} &= u^{{(i)}^T}\\lambda_i u^{(i)} \\\\\n",
    "&= \\lambda_i \\left|\\left|u^{(i)}\\right|\\right|^2 \\\\\n",
    "&\\ge 0 \n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "since $\\left|\\left|u^{(i)}\\right|\\right|^2 \\ge 0$, $\\lambda_i$ should be $\\ge 0$, too."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
